Effect of signal timing on vehicles’ near misses at intersections

Driving characteristics often vary between the different states of the signal. During the red and yellow phases, drivers tend to speed up and reduce their following distance, which in turn increases the possibility of rear-end crashes. Intersection safety, therefore, relies on the correct modelling of signal phasing and timing parameters and of how drivers respond to their changes. This paper aims to identify the relationship between surrogate safety measures and signal phasing. Unmanned aerial vehicle (UAV) video data were used to study a major intersection. Post encroachment time (PET) between vehicles was calculated from the video data, along with speed, heading and relevant signal timing parameters such as all red time, red clearance time, yellow time, etc. A random parameter ordered logit model was used to model the relationship between PET and signal timing parameters. Overall, the results showed that yellow time and red clearance time are positively related to PETs. The model was also able to identify certain signal phases that could be a potential safety hazard and would need to be retimed by considering the PETs. The odds ratios from the models also indicate that increasing the mean yellow and red clearance times by one second can improve the PET levels by 10% and 3%, respectively.

Driver behavior is an important element of road safety which indicates how an individual vehicle behaves due to the driving scene and surrounding environment. The presence of signalized intersections can affect how a person drives. In this study, the authors have used a quantitative surrogate safety measure to model the driving behavior from a safety standpoint: post encroachment time (PET) which can be considered as the temporal gap between two vehicles. Low PET indicates that the lagging vehicle is following too closely which can result in a rear end crash. Therefore, accepting low gaps can be an indicator of risky driving behavior. The authors have investigated whether this behavior can be modelled with respect to signal timing.
Moreover, traffic analysis from a safety point of view has largely relied on crash data. Various statistical and machine learning methods have been implemented to understand crashes proactively, enabling real-time prediction of these events. Countermeasures have been developed based on accident data as well. However, crashes are rare events, and crash data have notable shortcomings such as incorrect reasoning, subjectivism, inaccurate data, etc. 1,2. Moreover, the specific cause of a crash can often be a factor other than roadway characteristics and traffic features, which cannot be modelled using the prediction algorithms in the literature. On the other hand, conflict events are more common and can therefore help to better understand design flaws of the roadway as well as the traffic conditions that affect conflicts. Several previous studies have shown that conflict analysis can serve as an alternative to crash analysis with similar results [2][3][4][5]. Several metrics have thus been developed to measure conflict, such as time-to-collision (TTC) 6. The surrogate safety measures are usually dependent on exact localization of road users. For example, to calculate TTC, the initial location and velocity would be needed. This requires precise GPS locations. An effective way to study an intersection is with the help of an Unmanned Aerial Vehicle (UAV), which can then be used to extract accurate trajectories at the centimeter level. UAVs are a better alternative than roadside cameras, which suffer from localization distortion at the camera edges. UAVs are also known for easy maneuvering, flexibility, and low cost, and they have become an emerging video analysis solution in transportation in recent years. They are often augmented with radar and infrared cameras and can provide a bird's eye view of an intersection including the approaches. In this study, an intersection was analyzed with respect to PET using the data available through UAV.
The signal timing at that instant was also captured. The purpose of this study was to analyze the interaction of safety events and relate it to the signal states.

Literature review
Traffic safety at intersections has been shown to be dependent on signal timing at that intersection. For example, altering signal phases can improve or worsen intersection safety 9. Several studies have found that there is a direct relation between signal timings and crashes. After any retiming of signals, a crash reduction factor is also estimated, but a few studies have also reported that there were no significant relationships 10. Guo, Wang 11 showed that adaptive intersections experienced fewer crashes than isolated ones. The study was extensive and included over 170 intersections in Florida, USA, but the results were based on signal timing sheets only since real traffic data was not available. Midenet, Saunier 12 evaluated signal safety by measuring the exposure to lateral collisions using video feed. Approach level data from traffic detectors, including speed and volume, was found to be associated with significant crash risk 13. It was also reported in this study that longer green time for left turns and a higher green ratio can improve the safety at intersections. The main limitation of all these studies is that crash events are usually rare and, therefore, the studies could only rely on the spatial relationship between crash events and traffic parameters. It has been shown in several studies that the temporal relationship needs to be included as well, since traffic parameters and signal timing vary largely throughout the day and even across days [14][15][16]. Moreover, there are notable shortcomings of police reported crash data such as incorrect reasoning, subjectivism, inaccurate data, etc. 1,2. Additionally, there is the moral dilemma of waiting for fatalities to happen before taking an appropriate countermeasure, which makes it a reactive approach. Crash events are also rare, and it takes a long time to study a location or conduct a before-after study.
Surrogate safety measures provide an alternate and proactive methodology that does not require much time and solves the moral dilemma to a great extent. Several studies have also shown that it can significantly correlate to crashes and can mostly be used as an alternative [2][3][4][5]17 .
Using surrogate safety measures for signal timing was first proposed by Stevanovic, Stevanovic 18. The study proposed the integration of optimization and surrogate safety measure assessment at the microscopic level, considering both safety and efficiency. Network wide optimization was also studied recently 19. That work also incorporates simulation and surrogate safety measures to find an optimal solution using a model calibrated from real-world data. The influence of signal phasing on safety and traffic smoothness was also studied 20,21. It was also shown that optimization of the left turn waiting zones would improve capacity without degrading traffic flow 22, while Lin and Huang 23 improved both at the signal coordination level across multiple intersections. All of these studies have relied on simulation software such as VISSIM to model traffic signals and safety. While some studies calibrate the models based on real traffic flow, the ground data can be significantly different from the simulation. This work addresses this research gap and uses real-world data from UAVs to evaluate signal timing based on Post Encroachment Time (PET). The main objective of this work was to evaluate the impact of all-red time, red clearance time, red time, yellow time and green time on the surrogate safety measures based on real-world data. The results can also help relevant authorities to understand intersection traffic with respect to PET and gain insight into whether the signal timing needs optimization. Moreover, the odds ratio was also calculated to show that a one-second increase of yellow and red clearance time will help to increase the PET level, thereby improving the safety condition of the intersection.

Data preparation
Trajectory data. The vehicle trajectories provided by the CitySim dataset 24 were utilized to identify, process, and analyze PET conflicts in this study. The CitySim dataset is composed of top-view drone-video-based vehicle trajectories. The authors identified vehicle trajectories using mask-RCNN and subsequently extracted and exported rotation-aware bounding boxes. The process involves an extensive five-step pipeline: video stabilization, object filtering, video stitching, detection and tracking, and enhanced error filtering. Video stabilization was obtained through Scale-Invariant Feature Transform (SIFT) algorithm. Gaussian-mixture-based algorithm was used to filter background objects. Afterwards, object detection algorithm Mask R-CNN was used to obtain rotating bounding boxes. Finally, any remaining errors were filtered using human-in-the-loop. Each frame was checked manually to ensure the exactness of the bounding boxes.
The dataset contains vehicle trajectories sampled at 30 frames per second. For each trajectory point, the dataset provides four bounding box positions, speed, and heading. In this work, the University@Alafaya intersection location was selected for development, evaluation, and analysis. The intersection geometry is illustrated in Fig. 1. It is a signalized intersection between Alafaya Trail (9 lanes) and University Boulevard (9 lanes). The utilized trajectories were extracted from a video recorded on a weekday between 5:40 PM and 6:40 PM (afternoon peak). A total of 4871 vehicles passed through the intersection during that period of time. The different phases for each traffic direction are also shown in Fig. 1. There are three through lanes for each of the phases 2,4,6 and 8 while two left turning lanes for phases 1,3,5 and 7. The approach 4 does not have any exclusive right turn lanes while the other through phases all have an exclusive right turn lane.

Post encroachment time (PET).
Post Encroachment Time is a conflict indicator that serves as a surrogate safety measure. Figure 2 depicts an example PET conflict between two vehicles at a single timestep. PET measures the period of time between a leading vehicle leaving a particular location and a lagging vehicle arriving at the same location. In this scenario, the location where both vehicles interact is dubbed the conflict zone. The PET conflict indicator generates a sequence of PET values that describe the serial interaction between two vehicle trajectories under observation. A PET value exists in the generated PET sequence as long as the lagging vehicle remains in a conflict zone. Otherwise, the PET value at a timestep where no encroachment occurs is undefined.
In this research effort, the PET values were computed using the rotation-aware vehicle bounding boxes provided by the CitySim Dataset. At each timestep, and for each possible pair of vehicles, the PET value was measured as the time between the leading vehicle leaving a bounding box location and the moment the lagging vehicle's bounding box intersects with that previous location (i.e., the lagging vehicle intersects with the conflict zone as described in Fig. 2). For each pair of vehicles, an output PET sequence that describes their interaction was generated. The selected timestep was 1/3 s (3 Hz). PET values under 5 s were recorded. Table 1 describes the PET conflicts extracted from the study area. When sampled at 3 Hz, a total of 193,000 PET conflicts under 5 s were captured in the study area. Additionally, Table 1 reports the minimum PETs (minPETs). The minPET is defined as the minimum PET recorded between two vehicle trajectories. It describes the single most hazardous moment between unique vehicle pairs. Table 1 indicates that, during the recorded time, 717 unique vehicle pairs recorded a minPET under 1 s, and 7345 unique vehicle pairs encountered a minPET conflict under 5 s.
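The bounding-box PET computation described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the frame-indexed dictionaries, and the use of a separating-axis test for rotated rectangles are all assumptions made for the example. For each frame of the lagging vehicle, the sketch looks for the most recent earlier frame in which the leading vehicle's box covered space that the lagging box now overlaps, and reports the elapsed time as the PET.

```python
from typing import Dict, Optional, Sequence, Tuple

Point = Tuple[float, float]
Box = Sequence[Point]  # four corners listed in order around the rectangle


def _project(box: Box, axis: Point) -> Tuple[float, float]:
    """Project all corners onto an axis and return the covered interval."""
    dots = [x * axis[0] + y * axis[1] for x, y in box]
    return min(dots), max(dots)


def boxes_overlap(a: Box, b: Box) -> bool:
    """Separating-axis test for two convex polygons (touching counts as overlap)."""
    for box in (a, b):
        for i in range(len(box)):
            x1, y1 = box[i]
            x2, y2 = box[(i + 1) % len(box)]
            axis = (y1 - y2, x2 - x1)  # normal to this edge
            min_a, max_a = _project(a, axis)
            min_b, max_b = _project(b, axis)
            if max_a < min_b or max_b < min_a:
                return False  # a separating axis exists, so no overlap
    return True


def pet_sequence(lead: Dict[int, Box], lag: Dict[int, Box],
                 dt: float) -> Dict[int, Optional[float]]:
    """PET (seconds) at each lag frame; None where no encroachment occurs.

    `lead` and `lag` map frame index -> bounding box (hypothetical layout).
    `dt` is the timestep in seconds (1/3 s in the study).
    """
    lead_frames = sorted(lead)
    sequence: Dict[int, Optional[float]] = {}
    for frame in sorted(lag):
        # Earlier lead positions that the lagging box currently overlaps.
        hits = [f for f in lead_frames
                if f < frame and boxes_overlap(lead[f], lag[frame])]
        # The latest such frame approximates when the leader left that spot.
        sequence[frame] = (frame - max(hits)) * dt if hits else None
    return sequence


def min_pet(sequence: Dict[int, Optional[float]]) -> Optional[float]:
    """Smallest defined PET in a sequence (the pair's minPET)."""
    defined = [v for v in sequence.values() if v is not None]
    return min(defined) if defined else None
```

The pairwise search is quadratic in the number of frames, which is acceptable for a sketch but would likely need spatial indexing for a full hour of 3 Hz trajectories.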
Utilizing the vehicle bounding boxes for PET calculation is not common within previous research efforts. Instead, most previous work relied on center-point trajectories for conflict identification. As illustrated in Fig. 2, the vehicle geometry is essential for robust PET measurement. Center points misrepresent vehicle geometries and lead the conflict identification algorithm to neglect conflicts or underestimate their severity 25. Figure 3 compares heatmap plots of minPETs recorded in the study intersection using bounding boxes versus center points. It can be clearly observed that the bounding box approach was able to recall more conflicts than the center point method. For a minPET < 1.0 s, the center point method identified 141 conflicts compared to the 717 captured by the bounding box method. Similarly, for a minPET maximum threshold of 3.0 s, the center point and bounding box methods identified 3637 and 4365 conflicts, respectively. Figure 3 clearly demonstrates the superiority and robustness of the bounding box approach. Furthermore, it indicates that the center point misdetection rate increases with conflict severity, meaning that center-point-based computations fail to capture the most hazardous traffic conflicts.
Five different levels of PET were chosen based on past literature. In a study conducted by Zheng, Ismail 26, it was found that a PET threshold of 1.5 s exhibited the strongest correlation between crashes and conflicts. Results also indicated that PET thresholds of 1.5 s, 2 s, 2.5 s, and 3 s were all significantly correlated with crashes. Peesapati, Hunter 27 found through their study using the CDF and absolute number of PETs that values less than 1 s and 1.5 s were the most related to crashes, and PETs less than 3 s showed a degrading Pearson coefficient. Another study by Zheng and Sayed 28 chose a threshold value of 4 s to analyze extreme values of conflicts only.
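The conflict counts reported for Fig. 3 can be turned into an approximate share of conflicts missed by the center point method. The short calculation below is only illustrative: it assumes the bounding box count can serve as the reference total at each threshold, which the text does not state explicitly, and the function name is hypothetical.

```python
def missed_share(center_point_count: int, bounding_box_count: int) -> float:
    """Share of bounding-box conflicts not found by the center point method.

    Assumes the bounding box count is the reference total (an assumption).
    """
    return 1.0 - center_point_count / bounding_box_count


# Counts reported in the text for the two minPET thresholds.
under_1s = missed_share(141, 717)
under_3s = missed_share(3637, 4365)

print(f"minPET < 1.0 s: {under_1s:.1%} missed")  # about 80.3%
print(f"minPET < 3.0 s: {under_3s:.1%} missed")  # about 16.7%
```

Under that assumption, the center point method misses roughly four out of five of the most severe conflicts but only about one in six at the 3.0 s threshold, which is the pattern the text describes.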
Based on previous research, PET values less than 1 s or 2 s are considered critical, while those between 2 and 4 s are intermediate, and those between 4 and 5 s are mild conflicts. PETs were preferred over other conflict measures, such as TTC, because TTC assumes a straight-line collision course and is not suitable for intersections with left and right turn motions. PETs, on the other hand, can capture angle/crossing conflicts accurately 29 .
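A PET value can be mapped to one of the five ordinal levels with a simple lookup. The exact cut points used in the study are not listed in this section, so the 1 s wide bins below are an assumption chosen to agree with the statements that PETs under 1 s form level 1 and that only PETs under 5 s were recorded; the names are hypothetical.

```python
from bisect import bisect_right

# Assumed upper bounds (seconds) of levels 1..5; not taken from the paper.
UPPER_BOUNDS = (1.0, 2.0, 3.0, 4.0, 5.0)


def pet_level(pet_seconds: float) -> int:
    """Return the ordinal PET level (1 = most severe, 5 = mildest)."""
    if not 0.0 <= pet_seconds < UPPER_BOUNDS[-1]:
        raise ValueError("PET must lie in [0, 5) seconds")
    # bisect_right counts how many upper bounds are <= the PET value.
    return bisect_right(UPPER_BOUNDS, pet_seconds) + 1
```

For instance, `pet_level(0.3)` gives level 1 and `pet_level(4.97)` gives level 5, matching the range of PET values reported for the data.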
All the different datasets involving PET, speed, heading, and signal timing were merged together to obtain the final dataset. The descriptive statistics of the different variables as well as brief explanation of each variable in the final dataset are shown in Table 2. The various signal timing such as red, green, yellow, etc. are modelled as a countdown timer to understand the impact of the time remaining of a phase on PETs.
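The countdown-timer encoding of the signal variables can be sketched as below. The schedule layout (a dictionary of interval lists keyed by signal state) and the function names are assumptions for illustration; the actual structure of the merged dataset is given in Table 2.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[float, float]  # (start, end) of one signal interval, seconds


def time_remaining(t: float, intervals: List[Interval]) -> Optional[float]:
    """Seconds left in the active interval at time t, or None if inactive."""
    for start, end in intervals:
        if start <= t < end:
            return end - t
    return None


def countdown_features(t: float,
                       schedule: Dict[str, List[Interval]]
                       ) -> Dict[str, Optional[float]]:
    """Countdown value for every signal state at time t."""
    return {name: time_remaining(t, ivs) for name, ivs in schedule.items()}
```

With a hypothetical schedule in which yellow runs from 30 s to 35 s, a PET observed at t = 31 s would carry a yellow time of 4 s, meaning the signal is yellow with 4 s remaining.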
A sample case of changing PETs towards the end of a cycle is shown in Fig. 4. The PETs between interacting vehicles are shown in the figure. The lower the PET, the redder the bounding box, indicating high severity. It can be noted that as the phase turns green, the vehicles start to move with PETs between 1.5 and 2 s. As the phase turns from yellow to red, the PET drops further to 0.8 s as drivers try to clear the intersection.

Model
Random parameter ordered logit model. Random parameters logit model is a logit model for which the parameters are assumed to vary from one case to another. It is therefore a model that takes the heterogeneity of the population into account. In this study five levels of PET were considered.
We define the function determining the PET level as
$$U_{ij} = \beta_i X_{ij} + \varepsilon_{ij},$$
where $U_{ij}$ is a function determining the PET level $i$ for observation $j$, $X_{ij}$ is a vector of explanatory variables, $\beta_i$ is a vector of estimable parameters for outcome $i$ which may vary across observations, and $\varepsilon_{ij}$ is the error term, which is assumed to be generalized extreme value distributed (McFadden, 1981).
In order to develop random parameter models, we consider the following latent process as described by Sarrias (2016):
$$y^{*}_{it} = x_{it}^{\top}\beta_i + \epsilon_{it},$$
where $y^{*}_{it}$ is a latent (unobserved) process for individual $i$ in period $t$, $x_{it}$ is a vector of covariates, and $\epsilon_{it}$ is the error term.
Note that the conditional probability density function (PDF) of the latent process, $f(y^{*}_{it} \mid x_{it}, \beta_i)$, is determined once the nature of the observed $y_{it}$ and the population PDF of $\epsilon_{it}$ are known. If $y_{it}$ is binary and $\epsilon_{it}$ is distributed as normal, then the latent process becomes the traditional probit model; if $y_{it}$ is an ordered categorical variable and $\epsilon_{it}$ is logistically distributed, then the traditional ordered logit model arises. Formally, the PDFs for the binary, ordered, and Poisson models are, respectively,
$$f(y_{it} \mid x_{it}, \beta_i) = \begin{cases} \left[F(x_{it}^{\top}\beta_i)\right]^{y_{it}} \left[1 - F(x_{it}^{\top}\beta_i)\right]^{1 - y_{it}} \\ \prod_{j=1}^{J} \left[F(\kappa_j - x_{it}^{\top}\beta_i) - F(\kappa_{j-1} - x_{it}^{\top}\beta_i)\right]^{y_{itj}} \\ \frac{1}{y_{it}!} \exp\left[-\exp(x_{it}^{\top}\beta_i)\right] \exp(x_{it}^{\top}\beta_i)^{y_{it}} \end{cases}$$
where $y_{itj}$ equals 1 if $y_{it} = j$ and 0 otherwise. For the binary and ordered models, $F(\cdot)$ represents the cumulative distribution function (CDF) of the error term, with $F(\epsilon) = \Phi(\epsilon)$ for probit and $F(\epsilon) = \Lambda(\epsilon)$ for logit. For the ordered model, $\kappa_j$ represents the threshold for alternative $j = 1, \ldots, J - 1$, such that $\kappa_0 = -\infty$ and $\kappa_J = \infty$.
In the structural model given by Eq. (1), we allow the coefficient vector $\beta_i$ to be different for each individual in the population. In other words, the marginal effect on the latent dependent variable is individual-specific. Nevertheless, we do not know how these parameters vary across observations. All we know is that they vary according to the population PDF $g(\beta_i \mid \theta)$, where $\theta$ represents the moments of the distribution, such as the mean and the variance, which must be estimated. A fully parametric model arises once $g(\beta_i \mid \theta)$ and the distribution of $\epsilon$ are specified.
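For the ordered logit case, the probability of each PET level follows directly from differences of the logistic CDF evaluated at consecutive thresholds. The sketch below is illustrative only: the function names are hypothetical, and any thresholds or linear-predictor values passed to it are made up rather than estimates from this study.

```python
import math
from typing import List, Sequence


def logistic_cdf(z: float) -> float:
    """Numerically stable logistic CDF."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def ordered_logit_probs(xb: float, thresholds: Sequence[float]) -> List[float]:
    """P(y = j) for j = 1..J given the linear predictor xb and J-1 thresholds."""
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be in increasing order")
    # CDF at kappa_0 = -inf is 0 and at kappa_J = +inf is 1.
    cdf = [0.0] + [logistic_cdf(k - xb) for k in thresholds] + [1.0]
    return [cdf[j] - cdf[j - 1] for j in range(1, len(cdf))]
```

Raising the linear predictor shifts probability mass towards the higher levels, which is why a positive coefficient is read as pushing observations towards higher PET levels.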
For simplicity in notation, assume that the coefficient vector is independently normally distributed, so that $\beta_{ki} \sim N(\beta_k, \sigma_k^2)$ for the $k$-th element in $\beta_i$. Note that each coefficient can be written as $\beta_{ki} = \beta_k + \sigma_k \omega_i$, where $\omega_i \sim N(0, 1)$, or in vector form as $\beta_i = \beta + L\omega_i$, where $L$ is a diagonal matrix that contains the standard deviation parameters $\sigma_k$. All the information about the individual heterogeneity for each individual attribute is captured by the standard deviation parameter $\sigma_k$. If $\sigma_k = 0$, then the model reduces to the fixed parameter model, but if it is indeed significant, then it reveals that the relationship between $x_{itk}$ and $y_{it}$ is heterogeneous, and focusing on the central tendency $\beta_k$ alone would veil useful information. It is useful to note that the random effect model is a special case in which only the constant is random.
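The decomposition of each coefficient into a mean plus a scaled standard normal draw can be illustrated with a small simulation. This is a sketch under the independent normal assumption stated above; the function name and the numerical values in any call are hypothetical, and each element of the coefficient vector receives its own independent draw.

```python
import random
from typing import List, Sequence


def draw_individual_coefficients(means: Sequence[float],
                                 sds: Sequence[float],
                                 n_individuals: int,
                                 seed: int = 0) -> List[List[float]]:
    """Draw beta_i = beta + L * omega_i with diagonal L (independent normals)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_individuals):
        # One standard normal draw per coefficient, scaled by its sigma_k.
        draws.append([m + s * rng.gauss(0.0, 1.0) for m, s in zip(means, sds)])
    return draws
```

Setting every standard deviation to zero returns the same coefficient vector for all individuals, which is the fixed parameter special case mentioned in the text.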
Some measures of goodness-of-fit, including the log-likelihood and the Akaike Information Criterion (AIC), were used to find the best fitted model. The model that displayed the maximum value of the log-likelihood function was chosen to obtain the parameter estimates that made the data most likely. The AIC value was used to compare the performance of the candidate models. The preferred model is the one with the minimum AIC value. The AIC value can be evaluated using
$$\mathrm{AIC} = 2k - 2\ln(L),$$
where $k$ is the number of estimated parameters in the model and $L$ is the maximum value of the likelihood function for the model. The Bayesian Information Criterion (BIC) values were also calculated to determine the best model describing the relationship between the PET levels and the explanatory variables. The AIC introduces a penalty term based on the number of parameters. The BIC introduces a penalty term that combines the number of parameters and the sample size 30.
Random parameter ordered logit model with observed heterogeneity. This extension of the ordered logit model allows the coefficients to be correlated. The covariance matrix of the random parameters can be written as $LL^{\top} = \Sigma$, where $L$ is a lower triangular matrix. If $\Pi$ is a matrix of parameters, $s_i$ is a vector of covariates not varying in time and $\omega_i \sim N(0, 1)$, the parameter vector, its mean and its covariance can be written as
$$\beta_i = \beta + \Pi s_i + L\omega_i, \qquad E(\beta_i \mid s_i) = \beta + \Pi s_i, \qquad \mathrm{Var}(\beta_i \mid s_i) = LL^{\top} = \Sigma.$$
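The information criteria discussed in this section can be computed from a fitted model's log-likelihood as sketched below. The function names are hypothetical and the BIC is written in its standard form, $k\ln(n) - 2\ln(L)$; any numbers passed to these functions in an example are made up and are not results from Table 3.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2.0 * n_params - 2.0 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion: k ln(n) - 2 ln(L)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood
```

Because the BIC penalty grows with the sample size, it penalizes extra parameters more heavily than the AIC whenever the sample has more than about seven observations (that is, when ln(n) exceeds 2).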

Results
The results from the Random Parameter Logit Model and from the model with heterogeneity are shown in Table 3, and the conclusions are presented in the subsections. It is important to note here that the signal times are modelled as a countdown timer. For example, a yellow time of 4 s means that the signal state is currently yellow and has 4 s remaining. The authors decided to model the signal timings as a countdown timer because drivers usually try to speed up or slow down in anticipation of the end of a signal interval. Since AIC is used for selecting a prediction model and BIC is used for model explanation 31, the authors have chosen the appropriate model based on BIC values. The model with heterogeneity in means had better BIC values except for yellow time. Thus, the random parameter ordered logit model is suggested for yellow time and the model with heterogeneity in means is suggested for all other signal times. The signs of the different variables were identical in both models.
It was seen that the random parameter models performed better than the fixed effects models, as the AIC and BIC values of the random parameter models were much lower than those of the fixed effects models. The study evaluated the effect of signal times on PET levels. Thus, five models for different signal times (yellow, all red, red clearance, red and green) were estimated to ascertain their effects on PETs. As mentioned before, PETs less than 1 s were designated as level 1, and the data had five levels of PET, with PET values ranging from 0.3 s to 4.97 s. Other independent variables in the models were the different phases of the signal cycle, phase 1 through 8, where phases 2, 4, 6 and 8 were for through and right turn movements and phases 1, 3, 5 and 7 were the left turning ones. In summary, negative coefficients in Table 3 reduce PET levels (increase conflict severity) while positive coefficients increase PET levels (decrease conflict severity).

PET for intersection. PET values also behave differently when vehicles are inside the intersection versus when they are at the approach. It can be seen that for the models of red clearance, red and green timings, the intersection indicator variable was significant, and its negative coefficient indicates that PET levels inside the intersection are generally low, meaning that vehicles tend to maintain small gaps between them, which can in turn be a risky situation.
Yellow time vs PET. The overall yellow time is positively related to the different PETs. The lower the yellow time, the lower the PET which shows that the vehicles tend to follow each other closely towards the end of the yellow phase. The variable phase 5 shows that when the yellow for this phase is active, there are low PETs. This essentially indicates a probable issue with the length of the yellow time. The other phases that came out to be significant have the opposite relationship and can be interpreted to be safer.

All red time vs PET.
All red time is negatively correlated to PET. This shows that the vehicles that enter the intersection at the end of yellow have lower PET since they are essentially trying to clear the intersection. Taking the yellow time and all red time results together, it can be concluded that there are lower PETs at the boundary of yellow and all red time. The variables phase 1 and phase 3 have negative signs, meaning that when the all red interval of these phases is active, there are lower PETs, resulting in an unsafe state. This is also consistent with the visualization in Fig. 2, where we see a snapshot of the traffic state for phase 1 at an all red time of 1.8 s.
Red clearance time vs PET. The red clearance time was not significant in the model, but from the individual red clearance time per phase it is noted that the relationship is negative, meaning that each of the clearance times experiences lower PET. This can also be noted as a potential safety concern that will require careful signal timing optimization. All phases except phase 1 were found to be significant.

Red time, green time vs PET.
It can be seen that the green signal times of a cycle have positive signs, indicating that increasing them has the potential to increase the PET level. This in turn signifies that an increase in these timings has the potential to increase PET values between vehicles and to improve safety by reducing the probability of conflicts leading to rear-end crashes. On the other hand, an increase in red times causes the PET levels to decrease. This is expected since, as red time is increased, the vehicles are stopped at the approach and therefore have no conflicts.
Speeding proportion vs PET. Increase of speed of the vehicle also leads to an increase in PET. The speeding proportion is calculated for the leading vehicle and as such once this vehicle speeds, the distance between interacting vehicles increases thus increasing PET.
Odds ratio. The standard interpretation of an ordered logit coefficient is that for a one unit increase in the predictor, the response variable level is expected to change by its respective regression coefficient in the ordered log-odds scale while the other variables in the model are held constant. Thus, we calculate the odds ratio. Odds ratios can be obtained by exponentiating the ordered logit coefficients, $e^{\text{coef}}$. For a one unit change in the predictor variable, the odds for cases in a group that is greater than k versus less than or equal to k are the proportional odds times larger, where k is the level of the response variable. Therefore, as the coefficients of all red and red timings were negative, with a one unit increase in all red and red (when other variables are constant), the odds of low PET, meaning high risk values, are 1.188 and 0.986 times larger, respectively. For yellow, red clearance and green timings, as these coefficients are positive, with a one unit increase in yellow, red clearance and green timings (when other variables are constant in each of the models), the odds of values in high PET, meaning low risk levels, are 1.109, 1.034 and 1.015 times larger, respectively. This leads to an important conclusion regarding improving the PETs at intersections. Increasing the yellow, red clearance and green timings would lead to better PETs than increasing all red and red time. Since the data was collected during the afternoon peak, it might also be impactful to increase yellow and red clearance time for these periods only rather than for the entire length of the day.
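The conversion between ordered logit coefficients, odds ratios and percentage changes in odds can be sketched as follows. The function names are hypothetical; the odds ratios used in the example are the ones reported in the text, while the implied coefficients are back-calculated here for illustration and are not quoted from Table 3.

```python
import math


def odds_ratio(coef: float) -> float:
    """Odds ratio for a one unit increase in the predictor: exp(coef)."""
    return math.exp(coef)


def percent_change_in_odds(ratio: float) -> float:
    """Percentage change in the odds implied by an odds ratio."""
    return (ratio - 1.0) * 100.0


# Odds ratios reported in the text for yellow, red clearance and green time.
for name, ratio in [("yellow", 1.109), ("red clearance", 1.034), ("green", 1.015)]:
    implied_coef = math.log(ratio)  # back-calculated, for illustration only
    print(f"{name}: coef ~ {implied_coef:.3f}, "
          f"odds change ~ {percent_change_in_odds(ratio):.1f}%")
```

The yellow and red clearance odds ratios of 1.109 and 1.034 correspond to roughly 10.9% and 3.4% higher odds of a higher PET level per additional second, in line with the approximate 10% and 3% figures quoted elsewhere in the paper. A coefficient of −0.014 exponentiates to about 0.986, which matches the odds ratio reported for red time.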
Heterogeneity analysis. From Table 3, yellow time was found to be a significant, normally distributed random parameter with a mean of 0.014 and a standard deviation of 0.363. Therefore, larger yellow times are associated with higher PET levels, meaning less critical conflicts. The intersection variable is also normally distributed, with a mean of 7.445 and a standard deviation of 4.824. This means that vehicles within an intersection are more likely to be associated with higher levels of PET during all red time. An intersection is a less critical conflict location than an approach during all red time. The same can be said about red clearance time, since the intersection variable was also statistically significant with a mean of 1.8 and a standard deviation of 1.6. Red time was also significant, with a mean of −0.014 and a standard deviation of 0.007. Therefore, more critical conflicts are noticed at the start of red time. This is expected since at the start of red time, the vehicles slow down and come to a complete stop. Therefore, no unsafe PET levels are observed at the end of red time. The volume variable was also significant during red time (mean −0.023, standard deviation 0.014). Higher volumes during red time are related to lower PET levels, giving rise to more critical conflicts. Similarly, higher volume during green time also results in more critical conflicts (mean −0.003, standard deviation 0.006). The volume was not significant during the yellow, red clearance and all red times but was significant during the longer durations such as green and red time.
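Given a normally distributed random parameter, the share of observations whose coefficient falls below zero follows from the normal CDF. The sketch below applies this to the means and standard deviations reported above; the function name is hypothetical, and the calculation is an illustration of what the reported moments imply rather than a result stated by the authors.

```python
from statistics import NormalDist


def share_below_zero(mean: float, sd: float) -> float:
    """P(beta_i < 0) for a normally distributed random parameter."""
    return NormalDist(mu=mean, sigma=sd).cdf(0.0)


# Reported moments: yellow time (0.014, 0.363) and red time (-0.014, 0.007).
yellow_negative = share_below_zero(0.014, 0.363)
red_positive = 1.0 - share_below_zero(-0.014, 0.007)

print(f"yellow time, share with negative effect: {yellow_negative:.1%}")
print(f"red time, share with positive effect: {red_positive:.1%}")
```

Under the normality assumption, the yellow time coefficient would be negative for roughly 48% of observations, so its effect is strongly heterogeneous around a small positive mean, whereas the red time coefficient would be positive for only about 2% of observations.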

Conclusions
In summary, this paper proposes the use of UAV vehicle trajectory data to identify the relationship between signal timing and PET. One hour of UAV data was collected to obtain PETs, speeding, heading and signal phasing and timing. The PETs were calculated using rotating bounding boxes, based on the back of the leading vehicle and the front of the lagging vehicle, which gives a much more accurate PET than one computed from the center points of the vehicles. PET was then modelled using a Random Parameter Ordered Logit Model with heterogeneity in means. The PET values were divided into five classes. Results from the model showed that yellow time, red clearance time and green time are positively related to PET, while all red time and red time are negatively related to PET. The odds ratio indicated that it would be possible to increase the PET levels, and thereby improve safety, by increasing the yellow time and red clearance time by only 1 s. The practical application of this study lies in signal timing optimization. Usually, the various times are decided based on traffic volume and intersection geometry only, and safety remains largely disregarded. Using the results from this study, the signal timing can also be optimized based on safety parameters so that fewer conflicts are expected. This study can be used to understand the safety of an intersection in terms of signal timing. Following distance was calculated to indicate aggressive driving behavior and how it varies with the different phases. The results showed that drivers tend to follow closely during the end of yellow and during all red time. The approach can also assist in determining if signal retiming is warranted to help improve safety. Processing only an hour of video data has the potential to provide these insights to relevant authorities. Future studies can focus on the traffic dynamic features as well as different types of intersections to understand the relationship between surrogate safety measures and signal timing.
Moreover, other measures of conflicts such as time to collision (TTC), modified time to collision (MTTC), deceleration rate to avoid a crash (DRAC), etc. can also be studied.